Conversation

@paulbatum
Member

Description

I generated a set of tests from the existing agent tools samples we have. They fall into two categories:

  • simple tests that use a single tool
  • some more complex tests that combine multiple tools under the multitool branch

I have observed a high pass rate for these tests on gpt-4o. For other models, I see a higher failure rate. From my investigations so far, these seem to be surfacing real issues that can occur with different combinations of tools and models.

print(f"Fibonacci(10) = {result}")
"""

vector_store = openai_client.vector_stores.create(name="CodeAnalysisStore")
Member

After the test is done, you need to delete it.

Member Author

It is deleted on line 156:
openai_client.vector_stores.delete(vector_store.id)

It does have the problem of not being deleted if the test fails, but as discussed in the other comment, this is a bigger issue across the whole test suite.
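
For this specific case, a minimal sketch of guaranteeing the delete with try/finally, reusing the create/delete calls already shown in this thread; the test-body helper is a hypothetical stand-in:

vector_store = openai_client.vector_stores.create(name="CodeAnalysisStore")
try:
    run_code_analysis_test_body(vector_store)  # hypothetical placeholder for the test's assertions
finally:
    # runs even when an assertion above fails
    openai_client.vector_stores.delete(vector_store.id)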

print("✓ Code file analysis completed")

# Cleanup
project_client.agents.delete_version(agent_name=agent.name, agent_version=agent.version)
Member

I just realized our agents SDK has the same problem, and the same applies here. I might want to do this as a new PR:
If an assertion fails, the delete won't be executed. Perhaps we can have a decorator that wraps the test in a try/catch/finally; in the finally block, delete the agent.

Member Author

Yes, I noticed this; I agree it should be done in another PR if we are going to do it. One thing I will say is that the current behavior is handy when running locally: when a test fails, I still have the agent, so I can debug further by sending it more requests, including from the playground. Whichever way we pick, it should be consistent across the tests.
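
A minimal sketch of the decorator idea discussed above, assuming a test class with a project_client attribute; the delete call is the one the tests already use, everything else is illustrative:

import functools

def with_agent_cleanup(test_fn):
    """Run the test, then delete any agents it created, even if an assertion fails."""
    @functools.wraps(test_fn)
    def wrapper(self, *args, **kwargs):
        created_agents = []  # the test body appends each agent it creates
        try:
            return test_fn(self, created_agents, *args, **kwargs)
        finally:
            for agent in created_agents:
                # same cleanup call as in the snippet above; project_client location is an assumption
                self.project_client.agents.delete_version(
                    agent_name=agent.name, agent_version=agent.version
                )
    return wrapper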

This commit introduces extensive test coverage for agent tools functionality,
validating various tool types and their combinations across different scenarios.

New test coverage:
- Individual tool tests: file search, code interpreter, function tools, AI search,
  web search, bing grounding, MCP, and image generation
- Multi-tool integration tests: combinations of file search, code interpreter, and functions
- Conversation-based tool tests: multi-turn interactions with various tools
- Model verification tests: basic validation across different models
- Async test support: parallel execution testing for AI search

Test organization:
- tests/agents/tools/test_agent_*.py: Individual tool validation
- tests/agents/tools/multitool/*: Multi-tool integration scenarios
- tests/agents/test_model_verification.py: Model compatibility checks

Infrastructure updates:
- Enhanced servicePreparer with connection IDs for bing, AI search, and MCP
- Sample file improvements (agent naming, typo fixes)
- Comprehensive README documentation for agent tools tests

Note: One test (code interpreter file download) currently fails due to a known
service limitation where the container file download API does not support token
authentication. This will be resolved once the service adds support.
…oyment

- Image model deployment is now configurable via AZURE_AI_PROJECTS_TESTS_IMAGE_MODEL_DEPLOYMENT_NAME
- Test automatically checks if the image model deployment exists in the project using deployments.get()
- Gracefully skips the test if the image model is not available (instead of hardcoded region checks)
- Added image_model_deployment_name to servicePreparer for proper sanitization in recordings
- Defaults to 'gpt-image-1-mini' if environment variable is not set

This allows the test to run across different regions/projects with varying image model availability.
This test will be maintained in a separate branch for specialized testing.
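
A hedged sketch of the skip logic this commit message describes; deployments.get() is named above, but the exception handling and helper shape are assumptions:

import os
import pytest

def skip_unless_image_model_available(project_client):
    """Skip the current test when the configured image model deployment is absent."""
    deployment_name = os.environ.get(
        "AZURE_AI_PROJECTS_TESTS_IMAGE_MODEL_DEPLOYMENT_NAME", "gpt-image-1-mini"
    )
    try:
        project_client.deployments.get(deployment_name)  # raises if the deployment does not exist
    except Exception:  # assumption: a missing deployment surfaces as an exception
        pytest.skip(f"Image model deployment '{deployment_name}' is not available in this project")
    return deployment_name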
paulbatum force-pushed the developer/pbatum/agent-tools-tests branch from 703e1dd to 0101ab2 on November 22, 2025 02:11
paulbatum marked this pull request as ready for review on November 22, 2025 02:13
paulbatum requested a review from trrwilson as a code owner on November 22, 2025 02:13
Copilot AI review requested due to automatic review settings on November 22, 2025 02:13
Copilot finished reviewing on behalf of paulbatum on November 22, 2025 02:16
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds comprehensive test coverage for Azure AI Projects agent tools functionality, generated from existing samples. The tests validate various agent capabilities including web search, file search, code interpreter, function tools, MCP integration, and more complex multi-tool scenarios.

Key changes:

  • Addition of 11 single-tool test files covering individual agent capabilities (web search, MCP, image generation, function tools, file search, code interpreter, Bing grounding, AI search)
  • Addition of 5 multi-tool test files testing combinations of tools (file search + function, file search + code interpreter, etc.)
  • New environment variables in test configuration for connection IDs and settings
  • Fixed typo in sample documentation ("Bear" → "Bearer")

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Summary per file:

• sdk/ai/azure-ai-projects/tests/test_base.py: Added sanitization patterns for new test environment variables (Bing, AI Search, MCP connections, image model deployment)
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_web_search.py: Test for WebSearchPreviewTool with location-based queries
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_tools_with_conversations.py: Tests for using function, file search, and code interpreter tools within conversations
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_mcp.py: Tests for MCP tool with public and authenticated GitHub API access
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_image_generation.py: Test for ImageGenTool with base64 image validation
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_function_tool.py: Tests for custom function tools, including multi-turn conversations and context-dependent follow-ups
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_file_search_stream.py: Test for FileSearchTool with streaming responses
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_file_search.py: Tests for FileSearchTool, including a negative test for unsupported file types and multi-turn conversations
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_code_interpreter.py: Tests for CodeInterpreterTool, including simple math and file generation (the latter skipped due to a known bug)
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_bing_grounding.py: Tests for BingGroundingAgentTool with URL citations
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_ai_search_async.py: Async parallel test for AI Search question answering
• sdk/ai/azure-ai-projects/tests/agents/tools/test_agent_ai_search.py: Synchronous test for AI Search (skipped in favor of the faster async version)
• sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_multitool_with_conversations.py: Test for file search and function tools in the same conversation
• sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_code_interpreter_function.py: Tests combining file search, code interpreter, and function tools (3-4 tools)
• sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_and_function.py: Tests for file search + function tool combinations across various workflows
• sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_and_code_interpreter.py: Tests for file search + code interpreter tool combinations
• sdk/ai/azure-ai-projects/tests/agents/tools/multitool/test_agent_code_interpreter_and_function.py: Tests for code interpreter + function tool combinations
• sdk/ai/azure-ai-projects/tests/agents/tools/__init__.py: Empty init file for the test module
• sdk/ai/azure-ai-projects/samples/agents/tools/sample_agent_mcp_with_project_connection.py: Fixed typo: "Bear" → "Bearer" in authentication header comment
• sdk/ai/azure-ai-projects/.env.template: Added environment variables for Bing, AI Search, and MCP connection testing

print(f"Response: {response_text[:300]}...")

assert len(response_text) > 50
response_lower = response_lower = response_text.lower()
Copilot AI Nov 22, 2025

Variable assignment error: The line response_lower = response_lower = response_text.lower() has a duplicate assignment. It should be just response_lower = response_text.lower().

Suggested change
response_lower = response_lower = response_text.lower()
response_lower = response_text.lower()

vector_store = openai_client.vector_stores.create(name="SalesDataStore")
print(f"Vector store created (id: {vector_store.id})")

txt_file = BytesIO(txt_content.encode("utf-8"))
Copilot AI Nov 22, 2025

Import placement: BytesIO is used on line 65 but is only imported later at line 386 (within a function). For better code organization and to follow Python conventions, the import should be moved to the top of the file with other imports.
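
A short sketch of the suggested reorganization (the surrounding imports are assumed):

# at the top of the test file, alongside the other module-level imports
from io import BytesIO

# ...the existing usage then needs no in-function import:
txt_file = BytesIO(txt_content.encode("utf-8"))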

# pylint: disable=too-many-lines,line-too-long,useless-suppression
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed-----------------------------------------------------------------------------------------
Copilot AI Nov 22, 2025

The copyright header is incomplete - it shows "# Licensed-----------------------------------------------------------------------------------------" instead of "# Licensed under the MIT License." This appears to be a copy-paste error with excess dashes.

Suggested change
# Licensed-----------------------------------------------------------------------------------------
# Licensed under the MIT License.
